150 research outputs found

    Goal-oriented top-down probabilistic visual attention model for recognition of manipulated objects in egocentric videos

    We propose a new top-down probabilistic saliency model for egocentric video content. It aims to predict top-down visual attention maps focused on manipulated objects, which are then used for psycho-visual weighting of features in the problem of manipulated object recognition. The model is probabilistically defined using both global and local appearance features extracted from automatically segmented arm areas and objects. A psycho-visual experiment has been conducted in a guided framework that compares our proposal and other popular state-of-the-art models with respect to human gaze fixations. The obtained results show that our approach outperforms several popular bottom-up saliency approaches on a well-known egocentric dataset. Furthermore, an additional task-driven assessment for object recognition in egocentric video reveals that the proposed method improves the performance of several state-of-the-art techniques for object detection.
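    As a rough illustration of the psycho-visual weighting idea (not the authors' exact model), the sketch below weights each local descriptor's vote in a Bag-of-Visual-Words histogram by the value of a precomputed top-down saliency map at the corresponding keypoint; the function name and all parameters are illustrative assumptions.

        import numpy as np

        def saliency_weighted_bow(keypoints_xy, descriptors, saliency_map, codebook):
            # Hypothetical helper: builds a BoVW histogram in which each descriptor's
            # vote is weighted by the saliency value at its keypoint location.
            hist = np.zeros(len(codebook), dtype=np.float64)
            for (x, y), desc in zip(keypoints_xy, descriptors):
                word = np.argmin(np.linalg.norm(codebook - desc, axis=1))  # nearest visual word
                weight = saliency_map[int(round(y)), int(round(x))]        # psycho-visual weight
                hist[word] += weight
            total = hist.sum()
            return hist / total if total > 0 else hist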

    Fine-Grained Action Detection and Classification in Table Tennis with Siamese Spatio-Temporal Convolutional Neural Network

    International audience

    Multi-Layer Local Graph Words for Object Recognition

    In this paper, we propose a new multi-layer structural approach for the task of object-based image retrieval. In our work we tackle the problem of structural organization of local features. The structural features we propose are nested multi-layered local graphs built upon sets of SURF feature points with Delaunay triangulation. A Bag-of-Visual-Words (BoVW) framework is applied on these graphs, yielding a Bag-of-Graph-Words representation. The multi-layer nature of the descriptors consists in scaling up from trivial Delaunay graphs (isolated feature points) by increasing the number of nodes layer by layer up to graphs with a maximal number of nodes. A separate visual dictionary is built for each layer of graphs. The experiments conducted on the SIVAL and Caltech-101 data sets reveal that the graph features at different layers exhibit complementary performance on the same content and perform better than the baseline BoVW approach. The combination of all layers yields a significant improvement in object recognition performance compared to single-layer approaches. Comment: International Conference on MultiMedia Modeling, Klagenfurt, Austria (2012).
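    A minimal sketch of the graph-words pipeline described above, assuming stock OpenCV, SciPy, and scikit-learn: local features are detected (SIFT here as a stand-in for SURF, which requires opencv-contrib), their positions are Delaunay-triangulated, each triangle is described by concatenating its node descriptors, and the resulting descriptors are clustered into a visual dictionary. The paper's exact graph descriptor and multi-layer construction are not reproduced here.

        import cv2
        import numpy as np
        from scipy.spatial import Delaunay
        from sklearn.cluster import KMeans

        def triangle_graph_descriptors(image_gray):
            # Detect local features, triangulate their positions, and describe each
            # Delaunay triangle by the concatenated descriptors of its three nodes.
            sift = cv2.SIFT_create()  # stand-in for SURF (available only in opencv-contrib)
            keypoints, desc = sift.detectAndCompute(image_gray, None)
            if desc is None or len(keypoints) < 3:
                return np.empty((0, 0))
            points = np.array([kp.pt for kp in keypoints])
            tri = Delaunay(points)
            return np.array([np.concatenate([desc[i] for i in simplex])
                             for simplex in tri.simplices])

        def bag_of_graph_words(dictionary_descs, image_descs, n_words=200):
            # Cluster triangle descriptors into a visual dictionary and histogram one image.
            kmeans = KMeans(n_clusters=n_words, n_init=10).fit(dictionary_descs)
            words = kmeans.predict(image_descs)
            hist = np.bincount(words, minlength=n_words).astype(float)
            return hist / hist.sum()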

    Video event detection and visual data processing for multimedia applications

    This dissertation (i) describes an automatic procedure for estimating the stopping condition of non-regularized iterative deconvolution methods, based on an orthogonality criterion between the estimated signal and its gradient at a given iteration; (ii) presents a decomposition method that splits an image into a geometric (or "cartoon") part and a texture part using anisotropic diffusion, with orthogonality-based parameter estimation and stopping condition, exploiting the assumption that the cartoon and texture components of an image should be independent of each other; and (iii) describes a method for extracting moving foreground objects from sequences taken by a wearable camera with strong motion, in which camera-motion-compensated frame differencing is enhanced with a novel kernel-based estimation of the probability density function of the background pixels. The presented methods have been thoroughly tested and compared to state-of-the-art algorithms.
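    The kernel-based background model in point (iii) can be sketched as follows, assuming grayscale, motion-compensated frames; the bandwidth, threshold, and function name are illustrative assumptions rather than the dissertation's actual parameters.

        import numpy as np

        def kde_foreground_mask(frame, background_frames, bandwidth=15.0, threshold=1e-3):
            # frame: HxW grayscale float array; background_frames: NxHxW stack of
            # motion-compensated past frames. A pixel is labelled foreground when its
            # value is unlikely under a per-pixel Gaussian kernel density estimate.
            diff = background_frames - frame[None, ...]        # N x H x W differences
            kernels = np.exp(-0.5 * (diff / bandwidth) ** 2)   # Gaussian kernel per sample
            norm = bandwidth * np.sqrt(2.0 * np.pi)
            likelihood = kernels.mean(axis=0) / norm           # per-pixel KDE value
            return likelihood < threshold                      # True = foreground pixel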